Abstract

More than a dozen papers analyzing the COBE data have now ap- 
peared. We review the different techniques and compare them to a 
"brute force" likelihood analysis where we invert the full 4038 x 4038 
Galaxy-cut pixel covariance matrix. This method is optimal in the 
sense of producing minimal error bars, and is a useful reference point 
for comparing other analysis techniques. Our maximum-likelihood estimates
of the spectral index and normalization are $n \approx 1.15$ (0.95) and
$Q \approx 18.2$ (21.3) $\mu$K including (excluding) the quadrupole. Marginalizing
over the normalization $C_9$, we obtain $n \approx 1.10 \pm 0.29$ ($n \approx 0.90 \pm 0.32$).
When we compare these results with those of the various
techniques that involve a linear "compression" of the data, we find that 
the latter are all consistent with the brute-force analysis and have error 
bars that are nearly as small as the minimal error bars. We therefore 
conclude that the data compressions involved in these techniques do 
indeed retain most of the useful cosmological information.

1 Introduction



Since the first cosmic microwave background (CMB) anisotropies were detected
by the COBE DMR experiment (Smoot et al. 1992), a plethora of
different analysis techniques have been published. The aim of this paper is 
to compare them to a computationally cumbersome but statistically optimal 
method, to see how good they are. This is quite timely given the rate at 
which our data about the CMB sky is accumulating, since we wish to employ 
analysis techniques which are fast when faced with large data volumes and 
at the same time give error bars that are near the theoretical minimum. 
All published COBE analysis techniques have involved two steps: 

1. The full Galaxy-cut data set, consisting of say N = 4038 numbers, is 
by some clever form of "data compression" reduced to a smaller data 
set with N' < N numbers. 

2. Cosmological parameters are constrained by analyzing this reduced 
data set, usually by computing the likelihood function L(n,Q). 

The reason for doing the data compression is to speed up the calculations in 
2, which would otherwise involve repeated inversions of 4038 x 4038 matrices 
in the case of the COBE DMR experiment. The idea is that if the compres- 
sion method cleverly takes the physics into account, it mostly throws away 
noise and keeps the bulk of the cosmological information. 
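To make the idea of "throwing away noise" concrete, here is a minimal sketch of one linear compression of this kind, the signal-to-noise eigenmode construction discussed in the next paragraph. It is an illustration under stated assumptions, not the code of any of the cited papers: the noise is taken to be uncorrelated, and the function name `signal_to_noise_compress` and the toy covariance are hypothetical.

```python
import numpy as np

def signal_to_noise_compress(S, sigma, n_keep):
    """Return (A, lam): an n_keep x N compression matrix and the kept S/N eigenvalues.

    S      : (N, N) signal covariance matrix
    sigma  : (N,) rms noise per pixel (noise assumed uncorrelated)
    n_keep : number N' of modes to keep
    """
    # Whiten by the noise, so that the noise covariance becomes the identity.
    W = S / np.outer(sigma, sigma)
    lam, V = np.linalg.eigh(W)                  # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:n_keep]      # indices of the largest S/N modes
    A = (V[:, order] / sigma[:, None]).T        # rows are the compression vectors
    return A, lam[order]

# Toy example (random matrices, not COBE data).
rng = np.random.default_rng(0)
N, n_keep = 50, 10
B = rng.normal(size=(N, 5))
S = B @ B.T                                     # toy rank-5 signal covariance
sigma = rng.uniform(0.5, 2.0, N)                # toy pixel noise

A, lam = signal_to_noise_compress(S, sigma, n_keep)
M = S + np.diag(sigma**2)                       # covariance of the full data x
M_y = A @ M @ A.T                               # covariance of y = A x
```

In this basis the compressed covariance is diagonal, with entries equal to the signal-to-noise eigenvalue plus one, so modes with small eigenvalue are noise dominated and can be dropped at little cost.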

Bond (1994, hereafter B95), Gorski et al. (1994a, 1994b, hereafter G94),
and Bunn & Sugiyama (1995, hereafter BS95) all use linear data compression,
where the reduced data are simply the original data vector multiplied
by some matrix. Their reduced data sets contain 928, 957 and 400 numbers,
respectively. In G94, the row vectors of the compression matrix are chosen
to be those corresponding to the multipoles $2 \le \ell \le 30$, weighted to
account for pixel noise and orthogonalized. In BS95, the row vectors are instead
those that arise from a certain eigenvalue problem. [This technique is
described further in Bunn et al. (1995) and White & Bunn (1995).] This data
compression technique is often referred to as "expansion in signal-to-noise
eigenfunctions" and is known in signal processing as the Karhunen-Loève expansion
(Karhunen 1947). Given a prescribed "compression factor" N/N',
it can be shown to be optimal in a certain sense (BS95). This method was
independently introduced into cosmology in B95, where, after first reducing
the data by a factor 4 by smoothing,^1 the effect of compressing further with
various N' is studied in detail.

^1 The "pixel level 5" data set was used, where each pixel is obtained by averaging four
pixels from the 6144 pixel data set.

A second group of methods use quadratic data compression, where the
reduced data are various quadratic combinations of the pixel values. Work in
this category includes Smoot et al. (1992), Bennett et al. (1992), Scaramella
& Vittorio (1993), Seljak & Bertschinger (1993), and Bennett et al. (1994),
where the reduced data consist of bins of the observed correlation function,
typically around 60. Wright et al. (1994b) and de Oliveira-Costa & Smoot
(1995) use another set of quadratic statistics, the first 30 multipole moments
of the data. Estimates of the quadrupole (Gould 1993) and the total pixel
variance $(\Delta T/T)^2$ (Banday et al. 1994, Wright et al. 1994a) also fall into
the quadratic category.

A third group of methods compress the data set by forming quantities
that are higher-order combinations of the data than the above-mentioned
linear and quadratic ones. This includes cubic quantities (Hinshaw et al.
1994), correlation of extrema (Kogut et al. 1995) or topological information
such as genus and spot morphology (Smoot et al. 1994, Torres 1994) as the
reduced data.

What do we mean by an analysis technique being good? Apart from
the obvious requirement that it should give fairly unbiased estimates of the
cosmological parameters, we want it to give error bars that are as small as possible.
This is, of course, but one of several criteria one could choose to adopt. In
particular, one could aim for a technique that was maximally robust to systematic
errors such as residual Galactic contamination, or for a technique
that was numerically inexpensive to implement. One might also decide to
prefer techniques which are as model-independent as possible. Adopting
the latter criterion might lead one to look unfavorably on the technique of
BS95, as that method involves the choice of a "fiducial power spectrum"
in choosing the basis functions. However, the "brute force" technique described
herein does fare well by this criterion, since it assumes only that the
CMB fluctuations are Gaussian. We have chosen not to consider the effect
of systematic errors or Galactic contamination in this paper, although the
reader should of course bear in mind that the choice of the data analysis
technique used to derive the cosmological parameters and their uncertainties
may not be as significant as unmodeled systematic effects in the data, such
as Galactic emission. We also chose to disregard computational complexity
as a factor when choosing a data analysis method, in the spirit that the
computational work in the data analysis step will in any case be negligible
compared to the amount of effort already spent on collecting the data set.
In summary, we focus entirely on minimizing the statistical error bars. 

The size of the resulting error bars clearly depends on the choice of data 
compression method. One would expect that the smallest error bars would 
result from an analysis in which no compression at all was performed. In 
the Appendix we show that all linear compression techniques give likelihood 
functions that are on average at least as broad as the likelihood function 
arising from an analysis done without compression, and it is reasonable to 
expect that the same is true of nonlinear techniques. For instance, a corollary
of the Fisher-Cramér-Rao inequality (Fisher 1935; Kenney & Keeping
1951 p. 373), well-known to statisticians, implies that if there exists a best
unbiased estimate of the parameters, it will simply be the maximum likelihood
estimate using all the data.^2

We carry out this compression-free analysis in the present paper, to answer
the question of how small these optimal error bars are. Once this is
done, we can easily rank the other methods by checking how close to optimal
their error bars are. This may be termed "the brute force approach", as
the intuitive simplicity of avoiding data compression comes at the expense 
of a significant increase in CPU time. However, as we discuss below, the 
computations are in fact not as time-consuming as one may think. Also, 
since the brute force approach does not require any time-saving approxi- 
mations, one can at no extra cost include additional elements of realism in 
the model. Thus we investigate the effect of correlated noise (Wright et al. 
1994a; Lineweaver et al. 1994) and the effect of the standard method for 
dipole removal. As has been pointed out elsewhere, the latter tends to bias 
the estimates towards higher values of n if not properly accounted for. 

In techniques based on linear compression, it is easy to account properly 
for the effects of monopole and dipole removal, either by applying the same 
projection operator to the data covariance matrix as was applied to the data, 
or by marginalizing over the unknown modes. The only way to remove the
monopole and dipole bias from estimates based on nonlinear techniques is
to perform Monte Carlo simulations.^3

^2 Asymptotically, i.e., in the limit of infinitely large sample size, the conclusion becomes
even stronger: no other method, linear or not, can produce smaller error bars than a
maximum likelihood analysis using all the data. Since the COBE data contain a large
number of independent data points (there are about 100 signal-to-noise eigenmodes
with eigenvalue greater than 1; BS95), we expect the asymptotic limit to be a good
approximation.

^3 In general, since quantities like n and Q are not linear in the data, no technique, linear
or otherwise, is guaranteed a priori to give unbiased parameter estimates; the only way
to be sure that a particular technique is unbiased is to perform Monte Carlo simulations.
Almost all of the cited techniques have been tested for bias in this way.

In the next section, we introduce some simplifying notation. In section 3,
we describe how the pixel correlation is affected by removing the monopole,
dipole and quadrupole. In section 4, we present our results, and in the
discussion section we compare them to the previously published techniques.

2 Notation

As pointed out by numerous authors, COBE analysis is basically linear
algebra, and both the equations and their interpretation tend to become
simplified if we write the various quantities as vectors and matrices.

Let us write the CMB sky map as the N-dimensional vector x, defined by

$$x_i = \frac{\Delta T}{T}(\hat{n}_i), \qquad (1)$$

where $\hat{n}_i$ is a unit vector in the direction of the $i$th COBE pixel. N = 6144
for an all-sky map, and N = 4038 after a 20° Galactic cut. We write x as a
sum of three terms,

$$x = Y a + \varepsilon + Z b, \qquad (2)$$

which correspond to the contribution from cosmology, instrumental noise
and "nuisance" multipoles, respectively, and will now be described in more
detail.

The $N \times \infty$-dimensional spherical harmonic matrix Y is defined as

$$Y_{i\lambda} = Y_{\ell m}(\hat{n}_i), \qquad (3)$$

where we have combined $\ell$ and $m$ into the single index $\lambda = \ell^2 + \ell + m + 1 = 1, 2, 3, \ldots$
(Throughout this letter, we use real-valued spherical harmonics,
which are obtained from the standard spherical harmonics by replacing $e^{im\phi}$
by $\sqrt{2}\sin m\phi$, $1$, $\sqrt{2}\cos m\phi$ for $m<0$, $m=0$, $m>0$ respectively.) Making
the standard assumption that the CMB is Gaussian on COBE scales, a is
an infinite-dimensional Gaussian random vector with zero mean and with
the diagonal covariance matrix

$$\langle a_\lambda a_{\lambda'} \rangle = \delta_{\lambda\lambda'} C_\ell, \qquad (4)$$

the angular power spectrum $C_\ell$ being specified by some cosmological model.
Because of the addition theorem for spherical harmonics, this leads to the
covariance matrix of the cosmological term in equation (2) being given by

$$\langle (Ya)(Ya)^T \rangle_{ij} = C(\hat{n}_i \cdot \hat{n}_j), \qquad (5)$$

where the angular correlation function is defined as

$$C(\cos\theta) = \frac{1}{4\pi} \sum_{\ell=0}^{\infty} (2\ell+1)\, C_\ell\, P_\ell(\cos\theta), \qquad (6)$$

$P_\ell$ denoting the $\ell$th Legendre polynomial.

The N-dimensional noise vector $\varepsilon$ is assumed to be Gaussian with $\langle \varepsilon \rangle = 0$.
To an excellent approximation, its covariance matrix can be written
as $\langle \varepsilon_i \varepsilon_j \rangle = \sigma_i \sigma_j C^{(n)}(\hat{n}_i \cdot \hat{n}_j)$, where $\sigma_i$ is the rms noise of pixel $i$, and
the dimensionless noise correlation function $C^{(n)}$ has been computed by
Lineweaver et al. (1994). To a good approximation, $\langle \varepsilon_i \varepsilon_j \rangle = \sigma_i^2 \delta_{ij}$, but due
to the beam-switching strategy used in the COBE DMR experiment, there
are some minor corrections to this, primarily a correlation of the order of
0.5% between pixels separated by 60°.

The third term in equation (2), the "nuisance term", contains the unknown
influence of the multipoles with $\ell \le \ell_0$. Since we lack accurate a priori
knowledge of both the current CMB temperature and our Galaxy's peculiar
velocity, we need to set $\ell_0 \ge 1$. Some authors are concerned about systematic
errors in the quadrupole and therefore set $\ell_0 = 2$. The $N \times (\ell_0 + 1)^2$-dimensional
matrix Z is simply the first $(\ell_0 + 1)^2$ columns of Y, and b is a
$(\ell_0 + 1)^2$-dimensional constant vector whose value we a priori know nothing
whatsoever about. Since the noise $\varepsilon$ and the cosmic signal a are uncorrelated,
the pixel covariance matrix

$$M \equiv \langle x x^T \rangle - \langle x \rangle \langle x \rangle^T \qquad (7)$$

is given by

$$M_{ij} = C(\hat{n}_i \cdot \hat{n}_j) + \sigma_i \sigma_j C^{(n)}(\hat{n}_i \cdot \hat{n}_j). \qquad (8)$$
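As an illustration of how equation (8) might be assembled numerically, the sketch below builds the pixel covariance matrix from a power spectrum by summing the Legendre series of equation (6). Several things here are assumptions rather than the authors' code: the noise is taken to be uncorrelated, the sum is truncated at a finite multipole, any beam smoothing is assumed to be absorbed into the spectrum, and the spectrum itself is the standard Sachs-Wolfe power-law form parameterized by n and Q. The function names and the random pixel directions are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import legendre

def power_law_cl(n, Q, lmax):
    """C_l for l = 0..lmax for a Sachs-Wolfe power-law spectrum (C_0 = C_1 = 0)."""
    cl = np.zeros(lmax + 1)
    for l in range(2, lmax + 1):
        # Work with log-gamma to avoid overflow at large l.
        log_ratio = (math.lgamma(l + (n - 1) / 2) + math.lgamma((9 - n) / 2)
                     - math.lgamma(l + (5 - n) / 2) - math.lgamma((3 + n) / 2))
        cl[l] = 4 * math.pi / 5 * Q**2 * math.exp(log_ratio)
    return cl

def pixel_covariance(nhat, cl, sigma):
    """M_ij = C(n_i . n_j) + sigma_i^2 delta_ij, with the sum cut at l = len(cl) - 1."""
    cos_theta = np.clip(nhat @ nhat.T, -1.0, 1.0)
    ell = np.arange(len(cl))
    coeffs = (2 * ell + 1) * cl / (4 * math.pi)
    C = legendre.legval(cos_theta, coeffs)      # sum_l coeffs[l] * P_l(cos theta)
    return C + np.diag(sigma**2)

# Toy usage: 40 random directions instead of the real COBE pixel centres.
rng = np.random.default_rng(1)
nhat = rng.normal(size=(40, 3))
nhat /= np.linalg.norm(nhat, axis=1, keepdims=True)
cl = power_law_cl(n=1.0, Q=18.0, lmax=30)
M = pixel_covariance(nhat, cl, sigma=np.full(40, 30.0))
```

For n = 1 this spectrum reduces to the familiar scale-invariant form in which $\ell(\ell+1)C_\ell$ is constant, which is a convenient sanity check on the gamma-function expression.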

3 On monopole, dipole and quadrupole removal 

The standard way to eliminate the influence of the nuisance term is to "remove"
the monopole, dipole and perhaps quadrupole from the data x before
performing an analysis. Although often not thought of in such a way, this is
in fact a linear operation, corresponding to multiplying the data by a certain
matrix D. Let us define the matrix $\tilde{Z}$ to be an orthonormalized version of
Z, i.e., a matrix whose columns are orthonormal and span the same space
as the columns of Z. This can be achieved either by a standard technique
such as Gram-Schmidt orthogonalization or singular-value decomposition,
or by simply defining $\tilde{Z} = Z (Z^T Z)^{-1/2}$ and computing the square root by
Cholesky decomposition. In either case, the $N \times N$ matrix $\tilde{Z}\tilde{Z}^T$, which has
rank $(\ell_0 + 1)^2$, acts as a projection operator onto the space spanned by the
multipoles with $\ell \le \ell_0$, and it is clear that the "removal matrix" D is given
by $D = I - \tilde{Z}\tilde{Z}^T$. Thus defining the corrected data as

$$\tilde{x} = D x, \qquad (9)$$

it is readily seen that $\langle \tilde{x} \rangle = D Z b = 0$, so that the effect of the nuisance
term has been eliminated. The covariance matrix for the corrected data is
simply

$$\tilde{M} = \langle \tilde{x} \tilde{x}^T \rangle = D M D^T. \qquad (10)$$

We wish to stress that the covariance matrix for the corrected data can
in general not be described by a correlation function (Bennett et al. 1994;
Wright et al. 1994b). After the correction, the correlation $\tilde{M}_{ij}$ between two
pixels does not depend merely on the angle cosine between them, $\cos\theta = \hat{n}_i \cdot \hat{n}_j$,
but also on the Galactic latitude of both pixels. In other words,
the monopole and dipole removal breaks the rotational symmetry of the
correlation function. This is due to the well-known fact that the Galactic cut
destroys the orthogonality of the various multipoles, i.e., $Y^T Y \neq I$. Hence
when the monopole is removed, other multipoles are affected, primarily
the $m = 0$ components of the quadrupole and the hexadecapole. The dipole
removal strongly affects the three components of the octupole that have
$|m| \le 1$, etc. The situation is illustrated in Figure 1.

Some of the first papers analyzing the COBE data computed the average
correlation between pixels in various bins of angular separation. For
the 53+90 GHz 2 year data, this produces the wiggly line in Figure 1.
Cosmological parameters were then fitted by comparing this to the theoretical,
rotationally symmetric correlation function given by equation (6) (the
dashed line). As we have seen, this is not quite the correct thing to do. The
wiggly line should be compared with the theoretical prediction for the same
quantity (heavy solid line), using equation (10). Monte Carlo simulations
(e.g. Seljak & Bertschinger 1993; Bennett et al. 1994) have shown that this
effect leads to a small but non-negligible bias.

In addition, the figure shows that the symmetry breaking is quite substantial,
the correlations for any given angular separation varying within
the shaded region. As the correlation function method throws away this
information about azimuthal dependence by averaging in bins for fixed $\theta$,
one might expect the resulting error bars to be slightly larger than optimal.

For the reader with computational interests, we point out that the correction
of M is quite a rapid procedure. No $N \times N$ matrices need to be
multiplied together, since

$$\tilde{M} = D M D^T = M - (\tilde{Z} F^T + F \tilde{Z}^T) + \tilde{Z} (\tilde{Z}^T F) \tilde{Z}^T, \qquad (11)$$

where $F \equiv M \tilde{Z}$.
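The removal operation and the shortcut in equation (11) can be sketched as follows for the monopole and dipole case. This is a minimal illustration under assumptions of our own: the orthonormalization is done by QR factorization (one of several equivalent choices), the monopole and dipole templates are taken as a constant column plus the three Cartesian components of the pixel directions (which span the same space as the real harmonics with multipole 0 and 1), and the covariance matrix is a random toy matrix. The function names are hypothetical.

```python
import numpy as np

def removal_basis(nhat):
    """Return (Z, Zt): monopole+dipole templates and an orthonormal basis for them."""
    Z = np.column_stack([np.ones(len(nhat)), nhat])  # spans the l = 0 and l = 1 modes
    Zt, _ = np.linalg.qr(Z)                          # orthonormal columns, same span
    return Z, Zt

def corrected_covariance(M, Zt):
    """D M D^T with D = I - Zt Zt^T, using only N x k products as in equation (11)."""
    F = M @ Zt
    return M - (Zt @ F.T + F @ Zt.T) + Zt @ (Zt.T @ F) @ Zt.T

# Toy usage: random directions with a crude 20 degree cut around the equator.
rng = np.random.default_rng(0)
nhat = rng.normal(size=(60, 3))
nhat /= np.linalg.norm(nhat, axis=1, keepdims=True)
nhat = nhat[np.abs(nhat[:, 2]) > np.sin(np.radians(20.0))]
N = len(nhat)

B = rng.normal(size=(N, N))
M = B @ B.T / N + np.eye(N)          # toy positive definite covariance
Z, Zt = removal_basis(nhat)
M_t = corrected_covariance(M, Zt)    # corrected covariance, rank N - 4
```

The shortcut costs of order $N^2 k$ operations for $k$ removed modes, instead of the order $N^3$ needed to multiply out $D M D^T$ directly.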

4 Results 

Assuming that the cosmic signal a is Gaussian (that it is a random vector
whose probability distribution is a multivariate Gaussian), so is $\tilde{x}$. Thus
given an observed data set $\tilde{x}$, the likelihood L is given by

$$-2 \ln L = \ln\det\tilde{M} + \tilde{x}^T \tilde{M}^{-1} \tilde{x}, \qquad (12)$$

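up to an uninteresting additive constant. For a spectrum due to Sachs-Wolfe
fluctuations from power law density perturbations,

$$C_\ell = \frac{4\pi}{5} Q^2\, \frac{\Gamma\left(\ell + \frac{n-1}{2}\right) \Gamma\left(\frac{9-n}{2}\right)}{\Gamma\left(\ell + \frac{5-n}{2}\right) \Gamma\left(\frac{3+n}{2}\right)}, \qquad (13)$$
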
and the likelihood L(n, Q) becomes a function of merely two parameters,
the spectral index n and the quadrupole normalization Q. (Q is denoted
$Q_{rms-PS}$ in many papers.) We have evaluated L numerically on a grid of
points, and the resulting normalized likelihood is shown in Figure 2. With
$\ell_0 = 1$ (monopole and dipole removed), the maximum likelihood estimate
is $(n, Q) = (1.15, 18.2\,\mu$K), and with $\ell_0 = 2$ (quadrupole removed as well),
it is $(n, Q) = (0.95, 21.3\,\mu$K). The normalized marginal likelihood for n is
plotted in Figure 3 (the shaded distribution), together with that obtained
in G94 and BS95. As in those papers, we used the combined 53 and 90
GHz maps (including also the 31 GHz map gave us almost no error bar
reduction, as its noise level is so much higher), and a uniform prior. We use
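pixel by pixel minimum variance weighting when combining the data sets
for different channels and frequencies. In other words, a pair of data sets
$\{x'_i, \sigma'_i\}$ and $\{x''_i, \sigma''_i\}$ are combined into a new data set $\{x_i, \sigma_i\}$ according to
the formulas

$$x_i = \frac{x'_i / \sigma_i'^2 + x''_i / \sigma_i''^2}{1/\sigma_i'^2 + 1/\sigma_i''^2}, \qquad (14)$$

$$\sigma_i^{-2} = \sigma_i'^{-2} + \sigma_i''^{-2}. \qquad (15)$$
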
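The pixel by pixel minimum variance weighting described above amounts to the usual inverse-variance average. A minimal sketch, assuming independent noise in the two input maps (the function name `combine_maps` and the numbers are illustrative only):

```python
import numpy as np

def combine_maps(x1, sigma1, x2, sigma2):
    """Inverse-variance weighted combination of two maps, pixel by pixel."""
    w1 = 1.0 / sigma1**2
    w2 = 1.0 / sigma2**2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)   # minimum variance estimate of each pixel
    sigma = 1.0 / np.sqrt(w1 + w2)        # rms noise of the combined pixel
    return x, sigma

# Two-pixel toy example.
x, sigma = combine_maps(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                        np.array([3.0, 2.0]), np.array([1.0, 3.0]))
```

With equal noise levels the result is the plain average, and the combined noise is never larger than the smaller of the two input noise levels.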
As expected, our brute force technique gives the smallest error bars, i.e., the
narrowest distribution, corresponding to the highest peak. Also in agreement
with our expectations, the other techniques are seen to be fairly close
to optimal, with only marginally broader distributions. Our $1\sigma$ confidence
intervals are $n = 1.10 \pm 0.29$, $Q = (20.2 \pm 4.6)\,\mu$K. For the former, we have
followed G94 in marginalizing over the normalization at the "pivot point", in
our case $C_9$. Marginalizing over Q instead (which simply corresponds to using
a different Bayesian prior) makes only a minor difference: $n = 1.07 \pm 0.30$.
Conditioning on $n = 1$, $Q = (20.3 \pm 1.5)\,\mu$K. Choosing $\ell_0 = 2$ (removing
the quadrupole as well) is seen to yield a lower n-estimate, $n = 0.90 \pm 0.32$,
just as reported by other authors (since the observed quadrupole is relatively
small, it tends to favor power spectra with greater slope). Note that
quadrupole removal also increases the error bars, as it amounts to throwing
out a considerable amount of cosmological information.
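The step from a likelihood grid to a marginal distribution and an error bar can be sketched as below. This is only an illustration under simplifying assumptions: the grid is regular, the prior is uniform in (n, Q), the marginalization is over Q rather than over the pivot-point normalization, and the error bar is summarized by the standard deviation of the marginal rather than by a confidence interval. The input is a synthetic Gaussian surface, not the COBE likelihood, and all function names are hypothetical.

```python
import numpy as np

def normalize_grid(neg2lnL, dn, dq):
    """Turn -2 ln L on a regular (n, Q) grid into a normalized density (uniform prior)."""
    p = np.exp(-0.5 * (neg2lnL - neg2lnL.min()))   # subtract the minimum to avoid underflow
    return p / (p.sum() * dn * dq)

def marginal_over_q(p, dq):
    """Integrate the joint density over Q (axis 1) with a simple Riemann sum."""
    return p.sum(axis=1) * dq

def mean_and_sigma(grid, density, step):
    """Mean and standard deviation of a one-dimensional density on a regular grid."""
    mean = np.sum(grid * density) * step
    var = np.sum((grid - mean)**2 * density) * step
    return mean, np.sqrt(var)

# Synthetic stand-in for the likelihood surface.
n_grid = np.linspace(-1.0, 3.2, 421)
q_grid = np.linspace(0.0, 40.0, 401)
dn, dq = n_grid[1] - n_grid[0], q_grid[1] - q_grid[0]
nn, qq = np.meshgrid(n_grid, q_grid, indexing="ij")
neg2lnL = ((nn - 1.1) / 0.3)**2 + ((qq - 20.0) / 4.5)**2

p = normalize_grid(neg2lnL, dn, dq)
p_n = marginal_over_q(p, dq)
n_mean, n_sigma = mean_and_sigma(n_grid, p_n, dn)
```

In the real problem each grid point requires one evaluation of equation (12), which is where essentially all of the computing time goes.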

In computing the curves in Figure 3, we have assumed uncorrelated noise.
We also computed the Iq = 1 curve using the 1st year noise correlation given 
by Lineweaver et al. (1994), but omit this from the plot as the difference 
is so small that it is hardly visible. The combined 2 year noise correlation 
is of course weaker still. In other words, uncorrelated noise is an excellent 
approximation. 

In contrast, the approximation that the correlations are not affected by
monopole and dipole removal turns out not to be very good, which is hardly
surprising in view of Figure 1. The shaded curve was computed by removing
the monopole and dipole from the data and using the corrected covariance
matrix given by equation (10). The curve resulting from using the naive
covariance matrix M instead is seen to peak further to the right, the approximation
causing a shift of about the same magnitude as the quadrupole
removal, but in the opposite direction. We carried out Monte Carlo simulations
with $(n, Q) = (1, 20\,\mu$K) fake skies, which verified that the latter
approximation was biased high whereas the exact treatment appeared to be
fairly unbiased. It is easy to understand the sign of this effect on physical
grounds: since the non-orthogonal multipoles couple mainly to $\ell$-values differing
by small even numbers, removing the monopole and dipole covertly
removes parts of other low multipoles, which increases the slope of the best
fit power spectrum.

In linear techniques, this effect can be removed simply by proper treat- 
ment of the covariance matrix; in nonlinear techniques one must resort to 
Monte Carlo techniques. All of the analyses that quote estimates of n make 
some such attempt to account for potential bias. Bennett et al. (1994) have 
reported and removed just such a bias from their estimates. Smoot et al. 
(1994) check for bias with Monte Carlo simulations and find none. Wright 
et al. (1994b) use Monte Carlo simulations to compute the correct covari- 
ance matrix for their multipole estimates, and argue that this is sufficient 
to remove bias. 

We also tested an alternative approach to handling the bias from dipole
subtraction, used in BS95. If one uses the uncorrected data x instead of $\tilde{x}$,
the likelihood function becomes

$$-2 \ln L = \ln\det M + (x - Zb)^T M^{-1} (x - Zb), \qquad (16)$$

where the unknown multipoles b remain as nuisance parameters. We com- 
puted the marginal distribution over (n, Q) by simply integrating L over 
all b. This integral can readily be done analytically, and is of course inde- 
pendent of the value of the unknown nuisance multipoles. The result was 
virtually indistinguishable from that of the other method, and is therefore 
not plotted. 
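The analytic integral over b can be written down with the standard Gaussian integration formula. The sketch below is our own illustration of that formula, assuming a flat prior on b; it is not taken from BS95 or from the authors' code, and the function name and toy inputs are hypothetical. Integrating equation (16) over b gives, up to an additive constant, $\ln\det M + \ln\det(Z^T M^{-1} Z) + x^T [M^{-1} - M^{-1} Z (Z^T M^{-1} Z)^{-1} Z^T M^{-1}] x$.

```python
import numpy as np

def neg2lnL_marginal(x, M, Z):
    """-2 ln of the likelihood (16) integrated over all b, up to an additive constant."""
    L = np.linalg.cholesky(M)           # M = L L^T
    u = np.linalg.solve(L, x)           # L^{-1} x
    U = np.linalg.solve(L, Z)           # L^{-1} Z
    G = U.T @ U                         # Z^T M^{-1} Z
    g = U.T @ u                         # Z^T M^{-1} x
    chi2 = u @ u - g @ np.linalg.solve(G, g)
    logdet_M = 2.0 * np.sum(np.log(np.diag(L)))
    _, logdet_G = np.linalg.slogdet(G)
    return logdet_M + logdet_G + chi2

# Toy check that the result does not depend on the nuisance multipoles.
rng = np.random.default_rng(4)
N = 25
nhat = rng.normal(size=(N, 3))
nhat /= np.linalg.norm(nhat, axis=1, keepdims=True)
Z = np.column_stack([np.ones(N), nhat])      # monopole and dipole templates
B = rng.normal(size=(N, N))
M = B @ B.T / N + np.eye(N)                  # toy covariance
x = rng.normal(size=N)
b = np.array([5.0, -2.0, 1.0, 3.0])          # arbitrary monopole and dipole

base = neg2lnL_marginal(x, M, Z)
shifted = neg2lnL_marginal(x + Z @ b, M, Z)  # same value as base
```

Adding any combination of the nuisance templates to the data leaves the marginal likelihood unchanged, which is the numerical counterpart of the statement that the integral is independent of the unknown multipoles.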

For readers with computational interests, we conclude this section with a
few practical comments. The $N \times N$ matrix M is quite well-conditioned, due
to the large noise contributions to its diagonal, and can be readily Cholesky
decomposed. With an optimized code, only $N(N+1)/2$ numbers need to be
stored in RAM, as it is symmetric and the result can be computed in such
an order as to gradually overwrite the original matrix. On a good (1994)
workstation, this takes about ten minutes with N = 4038. The matrix $\tilde{M}$,
on the other hand, is singular, as it by construction has rank $N - (\ell_0 + 1)^2$.
(The pixels are not independent when we know that their mean is zero, their
dipole is zero, etc.) Thus the statistically correct thing to do is to throw away
$(\ell_0 + 1)^2$ arbitrary pixels before the likelihood analysis. The resulting matrix
inversion is not as well-conditioned as that for M, so whereas single precision
suffices for M, double precision should be used here. As a stability test, we
repeated the analysis with an additional 100 random pixels thrown away,
which greatly increases the condition number of the covariance matrix, and
obtained virtually identical results.
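A minimal sketch of the Cholesky evaluation of equation (12), including the step of discarding pixels so that the corrected covariance matrix becomes invertible. It uses NumPy's dense routines rather than the packed in-place storage described above, and it runs on a small random toy problem with the monopole and dipole removed; the function name and all inputs are illustrative assumptions.

```python
import numpy as np

def neg2lnL(x, M):
    """-2 ln L = ln det M + x^T M^{-1} x via Cholesky, up to an additive constant."""
    L = np.linalg.cholesky(M)                 # M = L L^T; raises if M is not positive definite
    u = np.linalg.solve(L, x)                 # u = L^{-1} x, so u.u = x^T M^{-1} x
    return 2.0 * np.sum(np.log(np.diag(L))) + u @ u

# Toy problem in double precision.
rng = np.random.default_rng(2)
N, k = 30, 4                                  # k = (l0 + 1)^2 with l0 = 1
nhat = rng.normal(size=(N, 3))
nhat /= np.linalg.norm(nhat, axis=1, keepdims=True)
Z = np.column_stack([np.ones(N), nhat])       # monopole and dipole templates
Zt, _ = np.linalg.qr(Z)
D = np.eye(N) - Zt @ Zt.T                     # removal matrix

B = rng.normal(size=(N, N))
M = B @ B.T / N + np.eye(N)                   # toy covariance, not a COBE model
x_t = D @ rng.multivariate_normal(np.zeros(N), M)
M_t = D @ M @ D.T                             # singular: rank N - k

value = neg2lnL(x_t[k:], M_t[k:, k:])         # drop k (arbitrary) pixels first
```

Which pixels are dropped changes the value only by a constant that does not depend on the model parameters, so likelihood ratios between models are unaffected.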

5 Discussion 

In this letter, we have carried out a "brute force" analysis of the two-year 
COBE DMR data which is optimal in the sense of giving the smallest possible
error bars. The results, $n = 1.10 \pm 0.29$, agree well with previous
work, reinforcing the conclusion that the large-scale CMB fluctuations are 
consistent with the standard inflationary scenario. 
We draw the following conclusions: 

1. All the published linear methods of estimating the COBE power spec- 
trum (B95, G94, BS95) are close to optimal, in the sense of giving error 
bars near the theoretical lower limit. 

2. The routine removal of the monopole and dipole can introduce a bias, 
tending to lead to a slight overestimate of the spectral index n. In 
linear techniques, this problem can be simply dealt with by using the 
appropriate covariance matrix. When using the (quadratic) correlation 
function technique, either the theoretical correlation function should be 
corrected as shown in Figure 1, or Monte Carlo simulation should be
used to subtract the resulting bias. Published work using the latter 
approach agrees well with our results. 

3. It is well-known that the slight noise correlation reported by Lineweaver 
et al. (1994) has only a small impact on CMB results. We have eval- 
uated this impact numerically, by including the full noise-covariance 
matrix, and confirmed that the effect is negligible even at the high 
accuracy used here. 

4. The quadratic methods have the advantage of giving sharp constraints 
with very few (< 100) reduced data points, which also tend to be easy to 
interpret physically (correlation function, multipoles). One drawback 
is that their probability distribution is no longer Gaussian, and if a 
Gaussian approximation is made, the computation of their covariance 
matrix can be rather cumbersome (Seljak & Bertschinger 1993; Wright 
et al. 1994b; de Oliveira-Costa & Smoot 1995) compared to the linear 
case. 

5. The above techniques all assume Gaussian fluctuations. Although topological
methods may not be the most efficient way to constrain the
power spectrum, they will no doubt provide very interesting tests of
the Gaussianity hypothesis as angular resolution improves.

In the future, as much larger data sets may become available through pro- 
posed experiments such as the COBRAS/SAMBA satellite, the brute-force 
approach used here will hardly be feasible. It is thus encouraging that the 
faster methods reviewed give results that are so close to optimal. 

The authors wish to thank Krzysztof Gorski, Dag Jonsson, Douglas Scott,
George Smoot, and Philip Stark for useful comments, Charley Lineweaver
for providing the numerical noise correlation data, and Angelica de Oliveira-Costa
for help with IDL plots. The COBE data sets were developed by
the NASA Goddard Space Flight Center under the guidance of the COBE 
Science Working Group and were provided by the NSSDC.

Appendix

In this Appendix we prove the statement that linear data compression can
never give a likelihood function that is more sharply peaked, on average,
than the likelihood function of the uncompressed data.

Let us begin by establishing some notation. Let x be an N-dimensional
vector representing the uncompressed data. We assume as usual that x is
Gaussian distributed with zero mean, and we denote its covariance matrix
$\langle x x^T \rangle$ by $M_x$. Let A be an $N' \times N$ "compression matrix" (with $N' < N$),
and let $y = Ax$ be the $N'$-dimensional compressed data vector.

Since x is a Gaussian random vector, its likelihood function is given by

$$\Lambda_x = x^T M_x^{-1} x + \ln\det M_x = \mathrm{Tr}\left(M_x^{-1} x x^T + \ln M_x\right), \qquad (17)$$

where $\Lambda_x \equiv -2 \ln L_x$. (Throughout this Appendix, we will find it convenient
to denote explicitly by subscripts the random variable with respect to which
we are computing likelihoods.) Suppose we are trying to estimate some
parameter q. We wish to compute the "width" of the likelihood function,
viewed as a function of q. We quantify this width as follows. Compute
$L_x$ as a function of q and find its maximum. Near the maximum, we can
approximate $\ln L_x$ as a quadratic, and we see that the width of the peak is
inversely proportional to the square root of the second derivative of $\ln L_x$
evaluated at the peak. Our concern will be with the average width of the
likelihood function, so we will be interested in the parameter

$$\gamma_x \equiv \langle \Lambda_x'' \rangle, \qquad (18)$$

where primes denote derivatives with respect to q and the derivative is evaluated
at the point where $\Lambda_x' = 0$. (This is the same parameter that was
adopted for the optimization problem in BS95.) A larger $\gamma$ corresponds to a
narrower likelihood, so we want to show that replacing the full data vector x
by the compressed data y can never cause $\gamma$ to increase. In other words, we
want to prove that

$$\gamma_y \le \gamma_x. \qquad (19)$$

We begin by extending the matrix A to a nonsingular $N \times N$ matrix
$\tilde{A}$ as follows. Add $N - N'$ rows to the bottom of A, making sure that
the added rows are linearly independent of each other and of the rows of
A, and are orthogonal to the rows of A with respect to the inner product
$(p, q) \equiv p M_x q^T$.^4 Let $\tilde{y} = \tilde{A} x$. $\tilde{y}$ is an N-dimensional vector whose first $N'$
elements are the elements of y. Let z be a vector consisting of the other
$N - N'$ elements of $\tilde{y}$:

$$\tilde{y} = \begin{pmatrix} y \\ z \end{pmatrix}. \qquad (20)$$

Since x and $\tilde{y}$ are related via a nonsingular linear transformation, the
likelihoods derived from them are the same, up to an overall multiplicative
constant.^5 Furthermore, because of the orthogonality condition we have
imposed, the covariance matrix of $\tilde{y}$ is block diagonal:

$$M_{\tilde{y}} = \begin{pmatrix} M_y & 0 \\ 0 & M_z \end{pmatrix}. \qquad (21)$$

The likelihood $L_{\tilde{y}}$ therefore factors:

$$L_{\tilde{y}} = L_y L_z. \qquad (22)$$

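Since $\gamma$ is linear in $\ln L$, we know that $\gamma_{\tilde{y}} = \gamma_y + \gamma_z$. Since $\gamma_x = \gamma_{\tilde{y}}$, all we
have to do to prove the inequality (19) is show that $\gamma_z \ge 0$. This follows
immediately from the following relation, proved in BS95:

$$\gamma = \mathrm{Tr}\left[(M^{-1} M')^2\right]. \qquad (23)$$

Since $M^{-1} M'$ is similar to the symmetric matrix $M^{-1/2} M' M^{-1/2}$, its eigenvalues
are real, so $\gamma_z$, being the trace of the square of this matrix, can never be negative.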
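The inequality (19) is easy to check numerically using equation (23). The sketch below is a toy illustration, not part of the proof: it assumes a one-parameter family $M(q) = qS + I$ with a random signal covariance S, and compresses with an arbitrary random matrix A. The function name `gamma` and all inputs are hypothetical.

```python
import numpy as np

def gamma(M, dM):
    """gamma = Tr[(M^{-1} M')^2] for a covariance M and its derivative dM = dM/dq."""
    X = np.linalg.solve(M, dM)      # M^{-1} M'
    return np.trace(X @ X)

rng = np.random.default_rng(3)
N, N_c = 40, 12                     # full and compressed dimensions (N' = N_c)
B = rng.normal(size=(N, N))
S = B @ B.T / N                     # toy signal covariance; take M(q) = q S + I
M_x = 1.0 * S + np.eye(N)           # covariance at q = 1
dM_x = S                            # dM/dq

A = rng.normal(size=(N_c, N))       # arbitrary N' x N compression matrix
gamma_x = gamma(M_x, dM_x)
gamma_y = gamma(A @ M_x @ A.T, A @ dM_x @ A.T)   # never exceeds gamma_x
```

Replacing the random A by the signal-to-noise eigenmodes with the largest eigenvalues would bring `gamma_y` as close to `gamma_x` as is possible for a given N'.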



^4 This is always possible. Since $M_x$ is positive definite, the function $(\cdot, \cdot)$ is a perfectly
good inner product, and we can therefore perform ordinary Gram-Schmidt orthogonalization
to generate the extra rows.

^5 This is easy to see by direct computation: the covariance matrix of $\tilde{y}$ is $M_{\tilde{y}} = \tilde{A} M_x \tilde{A}^T$,
so $\Lambda_{\tilde{y}} = \tilde{y}^T M_{\tilde{y}}^{-1} \tilde{y} + \ln\det M_{\tilde{y}} = x^T \tilde{A}^T (\tilde{A} M_x \tilde{A}^T)^{-1} \tilde{A} x + \ln\det(\tilde{A} M_x \tilde{A}^T) = x^T M_x^{-1} x + \ln\det M_x + 2 \ln\det\tilde{A} = \Lambda_x + 2 \ln\det\tilde{A}$.

Figure 1: How dipole removal alters the correlation function.
The pixel correlation $\langle x_i x_j \rangle$ is plotted as a function of the angle between
the two pixels. The dashed curve shows the naive correlation function corresponding
to $n = 1$, $Q = 18\,\mu$K. The shaded region shows the range of
correlations actually occurring after the monopole, dipole and quadrupole
are removed outside of a 20° Galactic cut, the heavy solid curve showing the
average correlation. The wiggly line is the correlation function naively extracted
from the 53+90 GHz 2 year COBE data. The bumps around $\theta = 60°$
are due to the noise correlation reported by Lineweaver et al. (1994).

Figure 2: The normalized likelihood.
The Bayesian probability distribution for the spectral index n and the normalization
Q is plotted (bottom) using the combined 53 and 90 GHz two
year COBE data and a uniform prior. The three contours (top) show the
areas containing 68%, 95% and 99% of the probability, respectively. The
"+" shows the maximum-likelihood estimate, $(n, Q) \approx (1.15, 18.2\,\mu$K).

Figure 3: Marginal likelihoods for n.
In order of decreasing peak height, the curves are obtained by the naive
brute force method (not corrected for "dipole bias"), the brute force method
(shaded), Bunn & Sugiyama 1994 (dashed), Gorski et al. 1994 (dotted) and
the brute force method with quadrupole removed, respectively. All five
curves are based on the combined 53 and 90 GHz two year COBE data and
are marginalized over the normalization at the "pivot point". All but the
last curve include the quadrupole.